CS7310-Algorithm@SJTU Group Project Instructor: Xiaofeng Gao
Resource Scheduling Problem in Hadoop
Project for Algorithm Design and Analysis
Department of Computer Science and Engineering,
Shanghai Jiao Tong University, Shanghai, China
Abstract. The course project focuses on the resource scheduling problem in Hadoop. This document
introduces the background and two versions of resource scheduling, on a single host and on multiple
hosts. It also lists all the tasks and requirements for each student group. Please read this document
carefully and complete the corresponding tasks.
Keywords: Distributed Computing System, Resource Scheduling, Hadoop
1 Background and Motivation
Hadoop is an open-source software framework for storing data and running applications on clusters
of commodity hardware. In Hadoop, the data of a running job can be divided into several data blocks,
stored on dierent hosts. Each host has one or multiple CPU cores to process data blocks. In this project,
our goal is to achieve eective parallel computing. Say, given a set of jobs with massive data, we need to
design a resource scheduling algorithm to minimize the overall executing time of all jobs.
2 A Simplied Version with Single Host
First, let us consider a simple case with a single host storing all data blocks. Please read the
assumptions, specifications, symbol notations, constraints, and explanations in Subsection 2.1 carefully
and then complete the tasks mentioned in Subsection 2.2.
2.1 Problem Formulation and Explanation
To simplify the problem, we give the following assumptions and specifications.
1. There are n jobs that need to be processed by a single host, which has m CPU cores of the same
computing capability. Let J be the set of jobs, and C be the set of cores for the host, where
J = {job_0, job_1, ..., job_{n-1}} and C = {c_0, c_1, ..., c_{m-1}} (the labels of variables start
from 0 because we use C/C++ source code in the following tasks).
2. We treat a data block as the smallest indivisible unit in our project, while a job can be divided
into multiple data blocks with different sizes for storage and computing. Assume job_i is split into
n_i data blocks, denoted by B^i = {b^i_0, b^i_1, ..., b^i_{n_i-1}}. For block b^i_k of job_i, define
its size as size(b^i_k).
3. Assume job_i is assigned to e_i cores for processing, and naturally e_i <= n_i. That is to say, one
core can process multiple data blocks of one job sequentially. Let B^i_j ⊆ B^i denote the set of data
blocks of job_i allocated to core c_j, with B^i_j ∩ B^i_{j'} = ∅ if j ≠ j' (the sets must be disjoint).
4. For job_i, the processing speed of its data blocks is s_i when job_i is assigned to a single core.
However, when multiple cores process job_i in parallel, the computing speed of each core decays because
of complicated interactions. We formulate this speed-decay effect of multi-core computation as a
coefficient function g(·) with respect to the core number e_i, as described in Equation (1):

    g(e_i) = 1.00 − α · (e_i − 1), for 1 <= e_i <= 10,    (1)

where α is a decay factor satisfying 0 < α < 1, and usually the number of cores processing a single
job is no more than 10. Then the speed of each core for job_i is s_i · g(e_i). (Note that although
the speed of each core decays, the overall processing time using e_i cores in parallel should be
shorter than that of using just one core; otherwise there would be no need for parallel computing.
Thus the setting of α should guarantee this principle.)
Correspondingly, the processing time tp^i_j of core c_j for job_i can be expressed as Equation (2):

    tp^i_j = ( Σ_{b^i_k ∈ B^i_j} size(b^i_k) ) / ( s_i · g(e_i) ).    (2)
5. For consistency, if we assign a job to multiple cores, all cores must start processing data blocks
at the same time. If one or several cores are occupied by other affairs, then all the other cores
should wait, keeping idle, for a synchronous start. That is, the processing of job_i on every assigned
core starts at the same time t_i, whereas the processing durations may differ. Let tf^i_j be the
finishing time of core c_j for job_i, which is calculated by Equation (3):

    tf^i_j = t_i + tp^i_j.    (3)
However, the cores occupied by job_i are released synchronously when the computing of the last data
block is finished. Thus the finishing time tf(job_i) of job_i is given as Equation (4):

    tf(job_i) = max_{c_j} tf^i_j, for c_j ∈ C.    (4)
Please keep these assumptions and specifications in mind and finish the following tasks.
2.2 Task 1: Resource Scheduling for Single Host
Based on the descriptions in Subsection 2.1, your work is to design a resource scheduling algorithm
to minimize the overall finishing time of all jobs, whose objective function is:

    min max_{job_i} tf(job_i), for job_i ∈ J.
Figure 1 illustrates a toy example of scheduling resources among three jobs job_0, job_1, job_2 on a
single host with four cores c_0, c_1, c_2, c_3. All blocks of job_0, job_1, and job_2 are stored on
this host, with different sizes measured in megabytes (MB). The goal is to find a scheduling strategy
that assigns data blocks to suitable cores so that the overall finishing time of the three jobs is
minimized.
[Figure] Fig. 1. A toy example of scheduling resources for a single host: the data blocks of job_0,
job_1, and job_2 are stored on host h, which has four cores c_0, c_1, c_2, c_3.
[Figure] Fig. 2. Time consumption of a feasible solution for the example in Figure 1 (job_0 takes
4.255 s, job_1 takes 5 s, job_2 takes 4.396 s starting at 5 s; the cores are idle part of the time).
Remark. Assume the sizes of the three blocks of job_0 are size(b^0_0) = 10 MB, size(b^0_1) = 20 MB,
and size(b^0_2) = 16 MB, while those of job_1 are size(b^1_0) = 9 MB and size(b^1_1) = 16 MB. The
sizes of the blocks in job_2 are size(b^2_0) = 10 MB, size(b^2_1) = 6 MB, size(b^2_2) = 20 MB, and
size(b^2_3) = 15 MB, respectively. Moreover, suppose the computing speed is s_i = 5 MB/s for
i = 0, 1, 2, and α = 0.03 in the decay coefficient g(·) of Equation (1).
Here we provide a feasible solution (shown in Figure 2) for the setting in Figure 1, in which the
blocks of job_0 are assigned to three cores c_0, c_1, c_2, and the blocks of job_1 are assigned to
the last core c_3.
Now we can compute the processing times of cores c_0, c_1, and c_2 for job_0 as in Equation (5):

    tp^0_0 = size(b^0_0)/(s_0 · g(e_0)) = 10/(5 × (1.00 − 0.03 × (3 − 1))) = 2.128 s,
    tp^0_1 = size(b^0_1)/(s_0 · g(e_0)) = 20/4.7 = 4.255 s,                              (5)
    tp^0_2 = size(b^0_2)/(s_0 · g(e_0)) = 16/4.7 = 3.404 s.
Moreover, the processing of job_0 starts at 0 s, so the finishing time of job_0 is

    tf(job_0) = 0 + max{tp^0_0, tp^0_1, tp^0_2} = max{2.128, 4.255, 3.404} = 4.255 s.
The processing times of blocks b^1_0 and b^1_1 on the single core c_3 are t^1_0 = 9/5 = 1.8 s and
t^1_1 = 16/5 = 3.2 s, respectively. Then the finishing time of job_1 is tf(job_1) = 1.8 + 3.2 = 5 s.
We assign job_2 to all four cores, each of which can start only after job_1 releases its occupied
core. Hence the processing of job_2 starts at 5 s. The processing time of each block of job_2 is:

    t^2_0 = size(b^2_0)/(s_2 · g(e_2)) = 10/(5 × (1.00 − 0.03 × (4 − 1))) = 2.198 s,
    t^2_1 = size(b^2_1)/(s_2 · g(e_2)) = 6/4.55 = 1.319 s,
    t^2_2 = size(b^2_2)/(s_2 · g(e_2)) = 20/4.55 = 4.396 s,
    t^2_3 = size(b^2_3)/(s_2 · g(e_2)) = 15/4.55 = 3.297 s.

Then we can get the finishing time of job_2 as

    tf(job_2) = 5 + max{2.198, 1.319, 4.396, 3.297} = 9.396 s.
Thus, the overall finishing time of the three jobs is

    max{tf(job_0), tf(job_1), tf(job_2)} = 9.396 s.
It is observed that for a multi-core job, the time consumption is determined by the last completed
block, while for the whole system, the total finishing time is decided by the last finished job.
However, Figure 2 is just a feasible solution and not necessarily optimal, since cores c_0, c_1, c_2,
and c_3 are all idle for a while during the whole process, which is a waste of resources.
Based on the above explanations and examples, please finish the following tasks:
1. Formalize the resource scheduling problem for a single host as a programming pattern with an
objective function and constraints. You cannot rename the variables that have been defined in the
project, whereas you are free to introduce other new variables; please define your variables clearly.
2. Please design an algorithm to solve this problem efficiently. First, describe your idea in detail,
and then provide the corresponding pseudocode. Also, please discuss the time complexity of your design.
3. Verify your algorithm using the attached test data in "task1_case1.txt" under the input file folder
and save your result in a .txt file named "task1_case1_TeamNumber.txt" (e.g., task1_case1_06.txt).
The input/output format is fixed in the reference codes. Optionally, visualize your result; the visual
format is not limited. For example, you can plot a figure from the perspective of cores on the host.
The test data and reference codes are also released on GitHub: Resource Scheduling Problem.
3 A Comprehensive Version among Multiple Hosts
In this section, we consider a more complex situation, where we need to schedule resources among
multiple hosts. Now the data transmission process between pairwise hosts should be taken into
consideration. The data blocks of jobs can initially be stored on different hosts, but one data block
can only be initially stored on one specified host. If data block b^i_k and its assigned computing
core c_j are not on the same host, b^i_k will be transmitted to the host containing c_j (here we
assume that the bandwidth between hosts is sufficient for data transmission). The transmission process
influences the finishing time of jobs, further affecting the resource scheduling process.
3.1 Problem Formulation and Explanation
Besides the descriptions and specifications of Task 1, here are more notations and explanations.
1. Assume we have q hosts H = {h_0, h_1, ..., h_{q-1}}, and host h_l has m_l cores (hosts may have
different numbers of cores). Let C_l be the set of cores on host h_l, C_l = {c^l_0, c^l_1, ...,
c^l_{m_l-1}}. It is easy to see that Σ_{l=0}^{q-1} m_l = m.
2. If core c^l_j on host h_l computes a data block b^i_k of job_i that is initially stored on another
host h_{l'}, then b^i_k needs to be transmitted from h_{l'} to h_l at a transmission speed s_t (this
speed is fixed in our system). An example is shown in Figure 3, where hosts h_0 and h_1 both have two
cores, and many jobs need to be processed. Core c^1_0 on host h_1 is assigned to compute the data
block b^1_2 of job_1, which is initially stored on host h_0. In this case, b^1_2 needs to be
transmitted from h_0 to h_1 at transmission speed s_t first, and then be computed by c^1_0. While
b^1_2 is in transmission, other cores can work in parallel to process job_1.
[Figure] Fig. 3. An example of data transmission between 2 hosts: block b^1_2 of job_1 is transmitted
from h_0 to h_1 at speed s_t before core c^1_0 computes it, while other cores work in parallel or
stay idle.
3. A core cannot call for data transmission while it is computing other data blocks. Likewise, a core
cannot start computing a data block until the block is ready, i.e., initially on the same host or
already transmitted from a remote host to the local host. For example, core c^1_0 on host h_1 must
wait for the transmission of block b^1_2 from host h_0 to h_1, and then start computation. What is
more, the transmission time of b^1_2 from h_0 to h_1 affects the finishing time of job_1, further
affecting the finishing time of the whole system.
For core c^l_j on host h_l, let B̃^i_{lj} be the set of data blocks of job_i allocated to c^l_j but
not initially stored on host h_l. All the data blocks in B̃^i_{lj} need to be transmitted to host h_l
before computing. Let B^i_{lj} be the set of data blocks of job_i allocated to core c^l_j. Then, the
processing time tp^i_{lj} of core c^l_j for job_i can be reformulated as Equation (6):

    tp^i_{lj} = ( Σ_{b^i_k ∈ B̃^i_{lj}} size(b^i_k) ) / s_t
              + ( Σ_{b^i_k ∈ B^i_{lj}} size(b^i_k) ) / ( s_i · g(e_i) ).    (6)
4. If the processing of job_i starts at time t_i, then the finishing time of core c^l_j for job_i is

    tf^i_{lj} = t_i + tp^i_{lj}.

Then the finishing time tf(job_i) of job_i is formulated as:

    tf(job_i) = max_{c^l_j} tf^i_{lj}, for c^l_j ∈ C.
Please keep these assumptions and specifications in mind and finish the following tasks.
3.2 Task 2: Resource Scheduling among Multiple Hosts
Similarly, the aim of Task 2 is to design a resource scheduling algorithm among multiple hosts to
minimize the overall finishing time, which is formulated as:

    min max_{job_i} tf(job_i), for job_i ∈ J.
Figure 4 shows a toy example of scheduling resources among 3 hosts. We have 4 different jobs, each
consisting of blocks with different sizes. For example, job_0 has three blocks, b^0_0, b^0_1, and
b^0_2, stored on host h_0. Other jobs are stored on different hosts. These data blocks are assigned
to available cores on the three hosts, each of which has two cores. The scheduling should consider
the data transmission process whenever a data block and its allocated core are not on the same host.
Our goal is to find a scheduling strategy that allocates data blocks to appropriate cores so that the
overall finishing time of the four jobs is minimized.
[Figure] Fig. 4. A toy example of scheduling resources among 3 hosts, each with two cores
(transmission speed s_t = 40 MB/s; computing speeds s_0 = s_1 = 10 MB/s, s_2 = s_3 = 12 MB/s).
Remark. Assume the sizes of the three blocks of job_0 are set as size(b^0_0) = 10 MB, size(b^0_1) =
20 MB, and size(b^0_2) = 15 MB, while those of the other jobs are listed on the left-hand side of
Figure 4. Set the decay factor α = 0.1, so the computing decay coefficient is g(e_i) = 1 − 0.1 ×
(e_i − 1). Besides, we set the transmission speed as s_t = 40 MB/s. Assume that job_0 and job_1 have
the same computing speed, s_0 = s_1 = 10 MB/s. Similarly, the computing speed of job_2 and job_3 is
s_2 = s_3 = 12 MB/s.
Under the above settings, we give a feasible solution for the example in Figure 4, as shown in
Figure 5. In the initialization phase, we know that the blocks of job_0 are on host h_0, the blocks
of job_1 are on h_1, and the blocks of job_2 and job_3 are on h_2. Thus we choose to compute job_0,
job_1, and job_2 on their own local hosts, each with two cores.
Firstly, the calculation of time consumption is similar to the example in Task 1. For instance, the
time consumption of block b^2_0, which is also the processing time of core c^2_0 for job_2, is

    tp^2_{20} = size(b^2_0) / (s_2 × g(e_2)) = 18 / (12 × g(2)) = 18 / (12 × 0.9) = 1.667 s.    (7)
As shown in Equation (7), we can likewise compute the time consumption of the remaining blocks in
job_0, job_1, and job_2. One job can be allocated to multiple cores on different hosts, and each
block of a job can be computed by only one core. We allocate the data blocks of job_2 to cores c^2_0
and c^2_1 for computing: in detail, c^2_0 computes b^2_0 and c^2_1 computes b^2_1. Cores c^2_0 and
c^2_1 must be released simultaneously when the computing of b^2_1 is finished. Thus, the finishing
time of job_2 in this stage is 1.852 s, after which c^2_0 and c^2_1 stay idle waiting for new task
allocation.
In the same way, job_0, job_1, and job_2 are allocated to certain cores to finish their computing.
As shown in Figure 5, the 4 cores on h_1 and h_2 stay idle after c^1_1 finishes its computation, so
allocating job_3 to these 4 cores on h_1 and h_2 is feasible.
[Figure] Fig. 5. Time consumption of a feasible solution for the example in Figure 4 (job_0 takes
2.778 s, job_1 takes 2.667 s, job_2 takes 1.852 s, job_3 takes 3.571 s; the overall finishing time
is 6.238 s; red dotted boxes mark idle cores).
When the system allocates job_3 to the 4 cores and starts processing it, it is necessary to transmit
some blocks of job_3 from host h_2 to host h_1. Following the synchronization rule, the start time of
job computing is also the start time of transmission. In this solution, the 4 blocks of job_3 are
allocated to the 4 cores for computing: b^3_1 and b^3_3 can be computed on the local host directly,
whereas b^3_0 and b^3_2 must be transmitted to host h_1. The transmission time for b^3_0 can be
calculated as

    size(b^3_0) / s_t = 10/40 = 0.25 s.
Consequently, the time consumption of b^3_0, which is also the processing time of core c^1_1 for
job_3, can be computed as Equation (8):

    tp^3_{11} = 0.25 + size(b^3_0) / (s_3 × g(e_3)) = 0.25 + 10 / (12 × g(4))
              = 0.25 + 10 / (12 × 0.7) = 1.440 s.    (8)
As before, some cores may stay idle after finishing their block computations. Thus, the processing
time of job_3 in this stage is the time consumption of b^3_3, which is 3.571 s. The overall time
consumption of this example is the latest finishing time among all jobs, which is calculated to be
6.238 s. Figure 5 shows the time consumption of this solution, where the red dotted boxes mark the
idle states of cores during the scheduling process.
Clearly, this solution does not utilize resources as fully as possible, and the data transmission
affects the final overall finishing time. Remarkably, the solution given in Figure 5 may not be
optimal for the example in Figure 4, since the idle states indicate wasted resources and the solution
can still be improved.
Based on the above discussions and examples, please finish the following tasks:
1. Formalize the resource scheduling problem among multiple hosts as a programming pattern with an
objective function and constraints.
2. Design an algorithm to solve this problem. First, describe your idea in detail, and then provide
the corresponding pseudocode. Also, please discuss the time complexity of your design.
3. Verify your algorithm using the test data "task2_case1.txt" in the input file folder and save your
result in a .txt file named "task2_case1_TeamNumber.txt" (e.g., "task2_case1_06.txt"). The
input/output format is fixed in the reference codes. Optionally, visualize your result; the visual
format is not limited. For example, you can plot a figure from the perspective of cores on the host.
The test data and reference codes are also released on GitHub: Resource Scheduling Problem.
4 Report Requirements
You need to submit a report for this project, with the following requirements:
1. Your report should include the title, the author names, IDs, email addresses, the page header, the
page numbers, your results and discussions for the tasks, figures for your simulations, and tables
for discussions and comparisons, with the corresponding figure and table titles.
2. Your report should be in English only, with a clear structure, divided into sections, and may
contain organizational elements such as items, definitions, or discussions.
3. Please include a Reference section and an Acknowledgment section. You may also include your
feelings, suggestions, and comments in the acknowledgment section.
4. Please define your variables clearly and add them to the symbol table in the Appendix.
5. Please create a folder named "Project-TeamNumber", which contains related materials such as the
report "Project-TeamNumber.pdf", the LaTeX source "Project-TeamNumber.tex", the executable file
"Project-TeamNumber.exe" (if you have one), the data output folder "Project-Outputs-TeamNumber", the
figure folder "Project-Figures-TeamNumber", and the other code file folder "Project-Codes-TeamNumber".
Then compress the home folder "Project-TeamNumber" into a "Project-TeamNumber.zip" package. Note that
TeamNumber should be a two-digit number, e.g., "Project-06.zip" conforms to the rule.
Acknowledgements
This problem is motivated by a real-world cloud service corporation. It was formulated by Prof.
Xiaofeng Gao (gao-xf@cs.sjtu.edu.cn) from the Department of Computer Science and Engineering at
Shanghai Jiao Tong University. Yue Ma scribed and modified the project. Yangyang Bao provided the
example in Task 1. Zhen Sun provided the example in Task 2. Wanghua Shi provided the test data and
source code. Jiale Zhang and Tianyao Shi helped with proofreading and provided many suggestions.
Appendix
Table 1. Symbols and Definitions

Symbols              Definitions
n                    The number of jobs
m                    The number of cores
q                    The number of hosts
job_i, J             job_i is the i-th job. The job set is J = {job_0, ..., job_{n-1}}.
h_l, H               h_l is the l-th host. The host set is H = {h_0, ..., h_{q-1}}.
m_l                  The number of cores on host h_l
c^l_j, C_l           c^l_j is the j-th core on host h_l. C_l is the set of cores on host h_l.
C                    The set of cores. C = {c_0, ..., c_{m-1}} for single-host; C = ∪_{l=0}^{q-1} C_l for multi-host.
b^i_k                The block of job_i whose id is k
B^i_j                The set of data blocks of job_i allocated to core c_j
B^i                  The set of data blocks of job_i
B^i_{lj}             The set of data blocks of job_i allocated to core c^l_j
B̃^i_{lj}             The set of data blocks of job_i allocated to core c^l_j but not initially stored on h_l
size(·)              The size function of a data block
g(·)                 The computing decay coefficient caused by the multi-core effect
s_i                  The computing speed of job_i on a single core
s_t                  The transmission speed of data
e_i                  The number of cores processing job_i
t_i                  The time at which the processing of job_i starts
tp^i_j, tf^i_j       The processing time / finishing time of core c_j for job_i
tp^i_{lj}, tf^i_{lj} The processing time / finishing time of core c^l_j for job_i
tf(job_i)            The finishing time of job_i